Preprocessing is usually associated with the C language. However, there is no reason why it cannot be applied elsewhere: in other languages, in data files, or even in plain text.
The m1 preprocessor, written by Jon Bentley (see [http://nero.deg.net/docs/OReilly.The.Unix.CD.Bookshelf.v3.0.Retail-EAT/sedawk/ch13_10.htm]), is an example of a general-purpose preprocessor that works on text. It is written in AWK, which can be VERY confusing to the newcomer.
A preprocessor is a tiny interpreter with a limited vocabulary. It is run after the source code has been edited and before the language interpreter or compiler sees it, and it is used to add functionality to the original language.
B-Prep will start off simple, and get more complex as time goes on.
File: BPREP-1.BAS
We'll start very simply, with a program that reads in a line and then writes it out. We'll use subroutines and functions so that each piece can be as simple or as complex as we need. To begin with, we'll just feed lines in from the console for immediate feedback. Once we've got the preprocessor working, we'll switch to reading from a file, and then we can use a Windows-type file selector.
'====== GLOBALS =======
global TRUE, FALSE, TRUE$, FALSE$, RawData$
global COMMENT$
FALSE = 0
TRUE = -1
FALSE$ = "0"
TRUE$ = "-1"
COMMENT$ = "'"
call InitializeSystem
[Top]
call GetNextLine ' Loads RawData$
call write RawData$
goto [Top]
end
'===================================================
'---------------------------------------------------
sub write aString$
    print aString$
end sub
'---------------------------------------------------
sub InitializeSystem
    call write "BASIC Preprocessor"
    call write ""
end sub
' Note that we eat up blank lines here - there's no reason for us to try to analyse a blank line.
' This is a matter of preference: take out the DO and LOOP lines if you want to keep blank lines in your output file.
'---------------------------------------------------
sub GetNextLine
    ' Eats up blank lines
    do
        input RawData$
    loop while RawData$ = ""
end sub
Enter the code, and run it -- make sure there were no typos. Not very exciting, yet, but it is the basis for all we'll do.
File: BPREP-2.BAS
The first command will be "#define". Note that I am using the "pound sign" ("octothorpe" if you want to impress people) character (#) to identify the preprocessor commands. I could have used almost any other character, but I've worked a lot with C, and my fingers find "#define" easier to type. Also, we're just testing to see if a name is defined. Testing for values will come later.
We'll need to do a little more work on the input line first, however. How will we handle a line that has a comment after the directive, for example "#define Rich" followed by a remark?
We can assume that anything after the necessary part ("#define Rich") is garbage and ignore it -- this is what m1 does. Or, we can add a trimmer function to remove comments. Let's do the trimmer, but in such a way that it can be removed later.
Change GetNextLine to this:
'---------------------------------------------------
sub GetNextLine
    ' Eats up blank lines
    do
        input RawData$
        Du = instr(RawData$, COMMENT$)
        if Du <> 0 then
            RawData$ = left$(RawData$,Du-1)
        end if
        RawData$ = RTrim$(RawData$)
    loop while RawData$ = ""
end sub
and add the new Globals:
DIM Definition$(1000)
'====== GLOBALS =======
global TRUE, FALSE, TRUE$, FALSE$, RawData$
global COMMENT$, TAB$, MAXDEFS
global Definition$, NextDef
FALSE = 0
TRUE = -1
FALSE$ = "0"
TRUE$ = "-1"
COMMENT$ = "'"
TAB$ = CHR$(9)
MAXDEFS = 1000
NextDef = 1
I've also added a function to trim blank spaces from the end of the input line.
'---------------------------------------------------
function RTrim$(AString$)
    Du = len(AString$)
    ' Test Du first, so that an empty string is never indexed at position 0
    while Du > 0
        Ch$ = mid$(AString$,Du,1)
        if (Ch$ = " ") or (Ch$ = TAB$) then
            Du = Du - 1
        else
            exit while
        end if
    wend
    RTrim$ = left$(AString$,Du)
end function
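To see what the comment stripper leaves behind, and why the trailing blanks need trimming, the step can be tried on its own. This is a minimal sketch rather than part of B-Prep: Sample$ and its contents are made up for illustration.

```basic
' Sketch only: Sample$ is a made-up input line, not part of BPREP-2.BAS
COMMENT$ = "'"
Sample$ = "#define Rich   ' turn the feature on"

' Cut the line at the first comment character, as GetNextLine does
Du = instr(Sample$, COMMENT$)
if Du <> 0 then
    Sample$ = left$(Sample$, Du - 1)
end if

' The brackets make any leftover trailing blanks visible
print "["; Sample$; "]"
end
```

The blanks between the name and the comment survive the cut, and those are what RTrim$ removes. Note also that this simple test cuts at the first apostrophe anywhere on the line, including one inside a quoted string.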
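Now we need to analyze the input string. In the main loop, add a call to Analyze right after the call to GetNextLine, and remove the call to write that follows it: Analyze now writes out the ordinary lines itself, so leaving both in would print ordinary lines twice and echo the directives. Then add the following subroutines: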
'---------------------------------------------------
sub Analyze
    ' Trim any leading spaces
    PPCmdStr$ = trim$(RawData$)
    ' Only a line whose first character is # is a preprocessor command
    if left$(PPCmdStr$,1) = "#" then
        ' Get the command
        PPCmd$ = word$(PPCmdStr$,1)
        ' Trim it
        PPCmdStr$ = trim$(mid$(PPCmdStr$,len(PPCmd$)+1))
        select case PPCmd$
            case "#define"
                call DoDefine PPCmdStr$
            ' For Testing
            case "#end"
                end
        end select
    else
        ' Everything else, including a line with a # further along, passes through
        call write RawData$
    end if
end sub
'---------------------------------------------------
sub DoDefine TheName$
    if IsDefined(TheName$) <> 0 then
        call DoError TheName$+" is already #defined"
    else
        ' Check for room before storing, so we never write past the end of the array
        if NextDef > MAXDEFS then
            call DoError "Too many #defines"
        else
            Definition$(NextDef) = TheName$
            NextDef = NextDef + 1
        end if
    end if
end sub
While we're using a console for input, we'll add a stop condition -- #end. Typing in #end will end the program, which is a little classier than pressing the [X] box of the console window.
We do not want to define something several times, so we test that there is no definition for that name already. This means we need error handling, so we'll need to add a subroutine for that. It is better to inform the user of what went wrong than to have the base language produce the error message: if that happens, the user will assume that there is something wrong with YOUR code, not theirs!
'---------------------------------------------------
function IsDefined(AName$)
    ' Returns the slot number of AName$, or 0 if it is not defined
    Du = NextDef-1
    while Du > 0
        if Definition$(Du) = AName$ then
            exit while
        end if
        Du = Du - 1
    wend
    IsDefined = Du
end function
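DoDefine calls DoError, but the code for that subroutine does not appear in this part. A minimal sketch, assuming all it has to do is show the message on the console (the "*** Error:" prefix and the use of print rather than write are assumptions, not the original code):

```basic
' Sketch only: a stand-in for the DoError subroutine that DoDefine calls
call DoError "Too many #defines"   ' sample call, for testing only
end

'---------------------------------------------------
sub DoError aMessage$
    ' print goes straight to the console, so the message stays visible
    ' even if write is later changed to send its output to a file
    print "*** Error: "; aMessage$
end sub
```

Only the sub itself needs to go into the program; the sample call and the end above it are there so the sketch can be run on its own.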
If you run this with Debug in Liberty BASIC, you should see Definition$ growing.
Since we've done "#define", we may as well do "#undefine". The C preprocessor uses "#undef", but that means too much typing when I want to use #define/undef to turn a feature on and off: I need to add "un", then go over and delete "ine" to turn it off, and reverse the process to turn it on. Had someone thought about it, they would have realized that just typing "un" (or removing two characters) is a lot easier than the previous contortions. Even though it's a tad faster to type "#undef" than "#undefine", my trained fingers often type in "#undefine" and I have to go back and change it anyway!
The first thing we'll need to do is see if the name is already defined, and if so, remove it. If we stop there, however, we could have 1000 definitions and 1000 "undefinitions," and the next time we try to do a define, we'll get an error. We'll need to add some garbage collection.
'---------------------------------------------------
sub DoUndefine TheName$
    Du = IsDefined(TheName$)
    if Du > 0 then
        ' Compact the array: move each later entry down one slot
        while Du < NextDef - 1
            Definition$(Du) = Definition$(Du+1)
            Du = Du + 1
        wend
        ' The last used slot is now free
        NextDef = NextDef - 1
        Definition$(NextDef) = ""
    end if
end sub
Right after the "#define" case in Analyze, add this:
case "#undefine"
    call DoUndefine PPCmdStr$
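Watching Definition$ in the debugger works, but a small listing routine can make the effect of #define and #undefine easier to see. A minimal sketch, assuming a hypothetical ShowDefinitions sub (it is not part of the original program) and two made-up names:

```basic
' Sketch only: ShowDefinitions and the sample names are hypothetical
dim Definition$(1000)
global NextDef

' Stand-in data: two definitions, so the next free slot is 3
Definition$(1) = "Rich"
Definition$(2) = "Debug"
NextDef = 3

call ShowDefinitions
end

'---------------------------------------------------
sub ShowDefinitions
    ' List every slot currently in use, oldest first
    for Du = 1 to NextDef - 1
        print Du; ": "; Definition$(Du)
    next Du
end sub
```

In B-Prep itself only the sub would be added. It could be called from a temporary case in Analyze while testing and taken out again afterwards.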
File: BPREP-3.BAS
Now we need to do something with all those lovely definitions. We'll try conditional output: IF a value is defined, then send out the lines until ENDIF is found. If it isn't defined, just gulp down the lines until ENDIF is found.
We are starting simple: no nesting allowed! Add the detectors to Analyze:
case "#ifdef"
    call DoIfDef PPCmdStr$
case "#endif"
    call DoEndIf
At first, if the value is undefined, just gobble up lines until #endif is found:
'---------------------------------------------------
sub DoIfDef TheName$
    Du = IsDefined(TheName$)
    if Du = 0 then
        do
            call GetNextLine
            Du$ = word$(RawData$,1)
        loop until Du$ = "#endif"
    end if
end sub
EndIf is even simpler:
'---------------------------------------------------
sub DoEndIf
    ' Do nothing
end sub
Test this code, and make sure it works. The m1 preprocessor does not have an "else" keyword. Instead, it uses "if" and "unless", with "unless" operating as "ifnot". It's OK, but I find it awkward to use. Let's add an "#else" keyword. For symmetry, put the detector just before "#endif":
case "#ifdef"
    call DoIfDef PPCmdStr$
case "#else"
    call DoElse
case "#endif"
    call DoEndIf
Now things will get interesting. In the simple case of the #ifdef name being undefined, we can treat #else as another form of #endif -- as an "end-of-gobble" marker. But if the name is defined, then the code following #ifdef is being analyzed, and #else must work as #ifnot: it has to start gobbling. So, first add another test to DoIfDef:
'---------------------------------------------------
sub DoIfDef TheName$
    Du = IsDefined(TheName$)
    if Du = 0 then
        do
            call GetNextLine
            Du$ = word$(RawData$,1)
            if Du$ = "#else" then
                exit do
            end if
        loop until Du$ = "#endif"
    end if
end sub
and add DoElse:
'---------------------------------------------------
sub DoElse
    do
        call GetNextLine
        Du$ = word$(RawData$,1)
    loop until Du$ = "#endif"
end sub
This uses a bit of a trick. If the #ifdef name is undefined, DoIfDef gobbles lines up to and including the #else, if one exists, and then stops gobbling. Analyze continues with the next line, the first of the #else section. If the name is defined, the lines following #ifdef are left for Analyze, and #else is not gobbled. When Analyze comes across it, it calls DoElse, which gobbles until #endif is found.
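A short run of input lines can exercise both paths. This is only a sample to type at the console: the name Rich and the print lines are placeholders, not part of the original files.

```basic
#define Rich
#ifdef Rich
print "Rich is defined"
#else
print "Rich is not defined"
#endif
print "Done"
#end
```

With the #define line present, Analyze should pass on only the first print line and the "Done" line. With the #define line left out, the second print line and the "Done" line should come through instead.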
Time to check this out. Note that #else and #endif do not check to see if they are part of an #ifdef block, so if you wanted to be wild, you could use #else-#endif to comment out blocks of code. But this would be considered Bad Form, and in general, a Very, Very Bad Idea (VVBI).
Go to Part 2
Files
The file PreProcess.zip is included in the zipped archive of this newsletter.